
FEAT Add ToxicChat dataset loader #1422

Open
romanlutz wants to merge 9 commits into Azure:main from
romanlutz:romanlutz/add-toxic-chat-dataset

Conversation

@romanlutz (Contributor) commented:

Add remote dataset loader for ToxicChat (lmsys/toxic-chat), containing ~10k real user-chatbot conversations from Chatbot Arena annotated for toxicity and jailbreaking attempts. Gracefully skips entries with Jinja2-incompatible content.

Copilot AI review requested due to automatic review settings on March 1, 2026 14:20
@romanlutz changed the title from "Add ToxicChat dataset loader" to "FEAT Add ToxicChat dataset loader" on Mar 1, 2026
Copilot AI (Contributor) left a comment


Pull request overview

Adds a new remote seed dataset loader for the HuggingFace lmsys/toxic-chat dataset so it can be discovered/loaded through PyRIT’s dataset provider framework, with accompanying unit tests and docs update.

Changes:

  • Introduces _ToxicChatDataset remote loader that converts ToxicChat rows into SeedPrompts and skips prompts that fail template parsing.
  • Adds unit tests for the ToxicChat loader.
  • Updates dataset-loading documentation output to include toxic_chat in the available dataset list.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File descriptions:

  • pyrit/datasets/seed_datasets/remote/toxic_chat_dataset.py: New HuggingFace-backed loader that builds SeedDataset/SeedPrompt entries from ToxicChat rows and skips problematic entries.
  • tests/unit/datasets/test_toxic_chat_dataset.py: New unit tests validating basic loading behavior and custom HF config plumbing.
  • pyrit/datasets/seed_datasets/remote/__init__.py: Exposes and imports the new loader for discovery/registration.
  • doc/code/datasets/1_loading_datasets.ipynb: Updates the displayed list of available datasets to include toxic_chat.
Comments suppressed due to low confidence (1)

tests/unit/datasets/test_toxic_chat_dataset.py:78

  • For consistency with other dataset loader tests (e.g., DarkBench), this test should assert that split (and optionally cache) are forwarded to _fetch_from_huggingface in addition to dataset_name and config; otherwise regressions in argument plumbing won't be caught.
        with patch.object(loader, "_fetch_from_huggingface", return_value=mock_toxic_chat_data) as mock_fetch:
            dataset = await loader.fetch_dataset()

            assert len(dataset.seeds) == 2
            mock_fetch.assert_called_once()
            call_kwargs = mock_fetch.call_args.kwargs
            assert call_kwargs["dataset_name"] == "custom/toxic-chat"
            assert call_kwargs["config"] == "custom_config"
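
A minimal, self-contained sketch of the extra assertions the reviewer asks for. The loader and its patched fetch call are simulated here with a plain MagicMock (the real test patches `_fetch_from_huggingface` on the loader, as in the snippet above); argument names mirror the test:

```python
from unittest.mock import MagicMock

# Stand-in for the patched _fetch_from_huggingface call.
mock_fetch = MagicMock(return_value=[])

# Simulate fetch_dataset() forwarding its arguments to the fetch helper.
mock_fetch(
    dataset_name="custom/toxic-chat",
    config="custom_config",
    split="test",
    cache=True,
)

kwargs = mock_fetch.call_args.kwargs
assert kwargs["dataset_name"] == "custom/toxic-chat"
assert kwargs["config"] == "custom_config"
# The additional assertions the review suggests, so regressions in
# argument plumbing are caught:
assert kwargs["split"] == "test"
assert kwargs["cache"] is True
```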

@romanlutz force-pushed the romanlutz/add-toxic-chat-dataset branch 2 times, most recently from cb93f7f to c63374c, on March 1, 2026 14:54
Copilot AI review requested due to automatic review settings on March 1, 2026 14:54
Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

tests/unit/datasets/test_toxic_chat_dataset.py:78

  • This test configures split="test" but doesn't assert that fetch_dataset() passes split through to _fetch_from_huggingface (unlike similar tests for other datasets). Adding an assertion for call_kwargs["split"] (and optionally cache) would improve coverage of the custom-config behavior.
        loader = _ToxicChatDataset(
            dataset_name="custom/toxic-chat",
            config="custom_config",
            split="test",
        )

        with patch.object(loader, "_fetch_from_huggingface", return_value=mock_toxic_chat_data) as mock_fetch:
            dataset = await loader.fetch_dataset()

            assert len(dataset.seeds) == 2
            mock_fetch.assert_called_once()
            call_kwargs = mock_fetch.call_args.kwargs
            assert call_kwargs["dataset_name"] == "custom/toxic-chat"
            assert call_kwargs["config"] == "custom_config"

Add remote dataset loader for ToxicChat (lmsys/toxic-chat), containing ~10k real
user-chatbot conversations from Chatbot Arena annotated for toxicity and
jailbreaking attempts. Gracefully skips entries with Jinja2-incompatible content.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz force-pushed the romanlutz/add-toxic-chat-dataset branch from c63374c to 79e2bf9 on March 2, 2026 12:58
The HF dataset identifier is now a class constant HF_DATASET_NAME
instead of a constructor parameter, consistent with other loaders.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
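
An illustrative sketch of the pattern this commit describes: the HuggingFace dataset id lives on the class as a constant rather than being a constructor parameter. Everything except the `HF_DATASET_NAME` constant and the `lmsys/toxic-chat` id is a hypothetical simplification, not PyRIT's actual class:

```python
# Hypothetical, simplified loader shape. Only HF_DATASET_NAME and the
# "lmsys/toxic-chat" id come from the PR; the rest is illustrative.
class ToxicChatLoaderSketch:
    HF_DATASET_NAME = "lmsys/toxic-chat"  # fixed on the class, not per-instance

    def __init__(self, config: str = "default_config", split: str = "train"):
        # Only config/split remain configurable; the dataset id cannot be
        # overridden through the constructor anymore.
        self.config = config
        self.split = split


loader = ToxicChatLoaderSketch(split="test")
assert loader.HF_DATASET_NAME == "lmsys/toxic-chat"
assert loader.split == "test"
```

Keeping the id as a class constant matches the other loaders and removes a constructor knob that, if set wrong, would silently fetch a different dataset.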
Copilot AI review requested due to automatic review settings on March 2, 2026 13:39
Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

doc/code/datasets/1_loading_datasets.ipynb:242

  • The notebook metadata language_info.version changed from 3.11.14 to 3.13.5, which is typically just an artifact of the author’s local environment. To reduce churn in docs, consider keeping this stable (or stripping environment-specific metadata) when updating notebooks.
   "version": "3.13.5"

romanlutz and others added 2 commits on March 2, 2026 05:55
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings on March 2, 2026 14:02
Raw wrapping preserves original dataset text including Jinja2-like
syntax (HTML, template tags), eliminating the need to catch and skip
TemplateSyntaxError entries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
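
The raw-wrapping step this commit describes can be sketched as plain string formatting: surround the original text with Jinja2 `{% raw %}` / `{% endraw %}` tags so template-like content (`{{ var }}`, `{% if %}`, HTML) is preserved verbatim instead of being parsed. The helper name `wrap_raw` is hypothetical:

```python
# Sketch of the raw-wrapping step. No Jinja2 import is needed for the
# wrapping itself; the tags only matter when the prompt is later rendered.
def wrap_raw(text: str) -> str:
    return f"{{% raw %}}{text}{{% endraw %}}"


wrapped = wrap_raw("Hi {{ user }}, <b>click</b> {% if x %}here{% endif %}")
assert wrapped.startswith("{% raw %}")
assert wrapped.endswith("{% endraw %}")
assert "{{ user }}" in wrapped  # original template-like text preserved
# Caveat (raised later in this review): input that itself contains
# "{% endraw %}" closes the wrapper early and can still break parsing.
```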
Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

pyrit/datasets/seed_datasets/remote/toxic_chat_dataset.py:112

  • Remote dataset loaders in this repo typically wrap prompt text in Jinja2 {% raw %}...{% endraw %} before constructing a SeedPrompt so that SeedPrompt.__post_init__ doesn't render/alter prompt content and so template-like text doesn't trigger parsing errors. Here value=item["user_input"] plus catching TemplateSyntaxError means some prompts will be silently dropped and others may still be mutated by Jinja rendering. Consider wrapping user_input in raw tags (and only skipping if it still fails), and include the conv_id in the debug log for traceability.
                description=description,
                source=source_url,
                authors=authors,
                groups=groups,
                metadata={
                    "toxicity": str(item.get("toxicity", "")),
                    "jailbreaking": str(item.get("jailbreaking", "")),
                    "human_annotation": str(item.get("human_annotation", "")),
                },
            )
            for item in data
        ]

        logger.info(f"Successfully loaded {len(seed_prompts)} prompts from ToxicChat dataset")

        return SeedDataset(seeds=seed_prompts, dataset_name=self.dataset_name)

romanlutz and others added 2 commits on March 2, 2026 06:49
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings on March 2, 2026 15:11
Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment on lines +58 to +61
async def test_fetch_dataset_preserves_jinja2_content(self):
"""Test that entries with Jinja2-like content are preserved via raw wrapping."""
data_with_html = [
{
Copilot AI commented on Mar 2, 2026:


The tests don’t exercise the advertised behavior of skipping truly Jinja2-incompatible inputs (e.g., user_input containing {% endraw %} that can break parsing even when wrapped). After adding skip/exception handling in the loader, add a unit test asserting such an entry is skipped while the rest of the dataset still loads.

Comment on lines +91 to +95
seed_prompts = [
SeedPrompt(
value=f"{{% raw %}}{item['user_input']}{{% endraw %}}",
data_type="text",
dataset_name=self.dataset_name,
Copilot AI commented on Mar 2, 2026:


fetch_dataset builds SeedPrompt objects in a list comprehension, but SeedPrompt.__post_init__ can raise when Jinja2 cannot parse the template (e.g., if user_input contains {% endraw %} or otherwise breaks the outer {% raw %}...{% endraw %} wrapper). In that case, the entire dataset load will fail rather than “gracefully skipping” bad entries as described. Consider switching to an explicit loop that wraps SeedPrompt(...) construction in a try/except (catch ValueError / Jinja2 TemplateSyntaxError), logs a warning with conv_id, and continues.
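
A self-contained sketch of the explicit loop this comment suggests. `SeedPrompt` construction is simulated by a small `make_prompt` helper that raises when the wrapped text would still break the outer raw tags; `TemplateError`, `make_prompt`, and `build_prompts` are hypothetical stand-ins, not PyRIT APIs:

```python
import logging

logger = logging.getLogger(__name__)


class TemplateError(ValueError):
    """Stand-in for the ValueError / jinja2 TemplateSyntaxError SeedPrompt can raise."""


def make_prompt(text: str) -> str:
    # Stand-in for SeedPrompt(...) construction: raise when the content
    # itself closes the {% raw %}...{% endraw %} wrapper early.
    if "{% endraw %}" in text:
        raise TemplateError("content closes the raw wrapper early")
    return f"{{% raw %}}{text}{{% endraw %}}"


def build_prompts(data):
    prompts = []
    for item in data:  # explicit loop instead of a list comprehension
        try:
            prompts.append(make_prompt(item["user_input"]))
        except TemplateError:
            # Log the conv_id for traceability, skip the entry, keep going.
            logger.warning("Skipping ToxicChat entry %s", item.get("conv_id"))
    return prompts


data = [
    {"conv_id": "a1", "user_input": "hello"},
    {"conv_id": "b2", "user_input": "bad {% endraw %} input"},
    {"conv_id": "c3", "user_input": "<b>html</b> and {{ var }}"},
]
assert len(build_prompts(data)) == 2  # one entry skipped, the load still succeeds
```

With this shape, a single malformed row degrades to a logged warning instead of failing the whole `fetch_dataset` call.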
